A quantitative study of disfluencies in French broadcast interviews
Abstract
The reported study aims to increase our understanding of spontaneous speech-related phenomena, using sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It makes use of 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from political or civil society. First, we considered press-oriented transcripts, in which most of the so-called disfluencies are discarded. These were then aligned with automatic transcripts using the LIMSI speech recogniser. This facilitated the production of exact transcripts, in which all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern types. Four questions were raised, including the following. Where do disfluencies occur in the utterance? What is the influence of the speakers' status? And what are the most frequent disfluency patterns?
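The idea of comparing a press-oriented transcript with an exact one can be illustrated with a small sketch. This is not the authors' pipeline, which relied on the LIMSI speech recogniser. It is a token-level alignment using Python's standard `difflib`, and the filled-pause inventory, the `extra_tokens` and `label` helpers, and the example sentence are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical filled-pause inventory; the study's annotation scheme is not given here.
FILLED_PAUSES = {"euh", "hum"}


def extra_tokens(press, exact):
    """Return (index, token) pairs present in the exact transcript only."""
    matcher = SequenceMatcher(a=press, b=exact, autojunk=False)
    extras = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" spans cover material missing from the press transcript.
        if tag in ("insert", "replace"):
            extras.extend((j, exact[j]) for j in range(j1, j2))
    return extras


def label(index, token, exact):
    """Very rough labelling: filled pause, immediate repetition, or other."""
    if token in FILLED_PAUSES:
        return "filled_pause"
    if index + 1 < len(exact) and exact[index + 1] == token:
        return "repetition"
    return "other"


# Invented example utterance, tokenised on whitespace.
press = "je pense que nous devons agir".split()
exact = "je pense que euh nous nous devons agir".split()

labels = [(tok, label(i, tok, exact)) for i, tok in extra_tokens(press, exact)]
rate = len(labels) / len(exact)  # share of exact-transcript tokens flagged
print(labels, rate)
```

A real study would need manual verification of each candidate, since discourse markers and revisions cannot be identified reliably from token differences alone.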
Similar resources
Quantitative study of disfluencies in schizophrenics' speech: Automatize to limit biases (Étude quantitative des disfluences dans le discours de schizophrènes : automatiser pour limiter les biais) [in French]
In this article we present the results of experiments we conducted on disfluencies in the discourse of schizophrenic patients (in remediation). These experiments are part of a larger study dealing with other levels of linguistic analysis, which could eventually help identify clues leading to the diagnosis of the disease. This study largely relies on natural language processing tools, which...
Speech Overlap and Interplay with Disfluencies in Political Interviews
The reported study focuses on overlapping speech, transcription, annotation and disfluency analysis in an 8-hour audio corpus of French political interviews. Overlaps are frequent (on average 3-4 overlaps per minute) and of short duration (5% of data), non-intrusive overlaps being significantly shorter than intrusive ones. Disfluencies include repetitions, revisions and filled pauses. Manual an...
Automatic detection and annotation of disfluencies in spoken French corpora
In this paper we propose a multi-step system for the semiautomatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We pres...
Effects of Sowing Methods on the Quality and Quantity Traits of Three Annual Medicago Species
Annual Medicago species (Medicago spp.) are native to the Mediterranean region and widely used in fields and pastures in Iran. There are several methods of sowing annual Medicago species, each with different effects on performance. However, there is currently insufficient information about the appropriate methods for sowing Medicago species. In order to evaluate methods of sowing...
The need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
Publication date: 2005